Abstract

This paper describes and analyzes a hierarchical algorithm called Multiscale Gossip for solving the 
distributed average consensus problem in wireless sensor networks. The algorithm proceeds by recursively 
partitioning a given network into subnetworks. Initially, nodes at the finest scale gossip to compute local 
averages. Then, using multi-hop communication and geographic routing to enable gossip between nodes 
that are not directly connected, these local averages are progressively fused up the hierarchy until the 
global average is computed. We show that the proposed hierarchical scheme with k = Θ(log log n) levels of hierarchy is competitive with state-of-the-art randomized gossip algorithms in terms of message complexity, achieving ε-accuracy with high probability after O(n log log n log(kn/ε)) messages. Key to our analysis is the way in which the network is recursively partitioned. We find that the optimal scaling law is achieved when subnetworks at scale j contain Θ(n^{(2/3)^{j−1}}) nodes; then the message complexity at any individual scale is O(n log ε^{-1}). Another important consequence of the hierarchical construction is that the longest distance over which messages are exchanged is O(n^{1/3}) hops (at the highest scale), and most
messages (at lower scales) travel shorter distances. In networks that use link-level acknowledgements, 
this results in less congestion and resource usage by reducing message retransmissions. Simulations 
illustrate that the proposed scheme is more message-efficient than existing state-of-the-art randomized 
gossip algorithms based on averaging along paths. 



I. Introduction 

Distributed signal and information processing applications arise in a variety of contexts including 
wireless sensor networks, the smart grid, large-scale unmanned surveillance, and mobile social networks.
Large-scale applications demand protocols and algorithms that are robust, fault-tolerant, and scalable. 
Energy efficiency is also an increasingly important design factor. When a system is composed of battery-powered nodes or agents equipped with wireless radios for transmission, such as in wireless sensor networks, energy efficiency equates to requiring few transmissions since, in addition to consuming bandwidth, each wireless transmission dissipates battery resources.

Gossip algorithms [2]–[6] are an attractive paradigm for decentralized, in-network processing, and have received much attention in the computer science, systems and control, information theory, and signal processing research communities of late. Gossip algorithms are frequently posed and studied as solutions to the distributed averaging problem: in a network of n nodes whose topology is described by a graph G = (V, E) with |V| = n, each node i initially has a scalar value x_i(0), and the goal is to approximate the average,

$$x_{\mathrm{ave}} = \frac{1}{n}\sum_{i=1}^{n} x_i(0),$$

at every node. Nodes iteratively and asynchronously exchange estimates with a small subset of the entire network, updating their local estimate after each exchange. These protocols have a number of attractive properties. The simplicity of the protocol (exchange information, update, repeat) makes it extremely robust; since there is no fixed routing of information to a fusion center and since all nodes compute a solution, there is no single point of failure or bottleneck. Furthermore, past studies have demonstrated that gossip algorithms converge even under unreliable or dynamic networking conditions; see, e.g., [6] and references therein.

However, the standard gossip algorithms for distributed averaging [2]–[5] constrain information to be exchanged only between neighboring nodes, and they exhibit poor scaling and energy efficiency in topologies frequently used to model connectivity in wireless networks, such as grids and random geometric graphs [2]. Roughly speaking, the number of messages transmitted per node depends linearly on n, the size of the network. Since the n values required to compute the average are initially stored at different nodes, any distributed averaging algorithm requires that each node perform at least one transmission. This discrepancy between constant and linear transmissions per node has motivated the development of a number of variants of gossip algorithms specifically aimed at improving the efficiency of gossip on grid and random geometric graph topologies (see Section II for more).

The principle of hierarchical (multiscale) decomposition, or divide-and-conquer, arises in a variety of 
settings as a mechanism which yields efficient information processing procedures. In the signal processing 
and coding communities, multiscale analysis is frequently associated with wavelet-based methods, e.g., 
for signal and image denoising, edge detection, and transform coding [7], [8]. A hierarchical approach to communication over wireless networks was shown to achieve the optimal capacity scaling law [9]. A recent study also found that flocks of birds exhibit hierarchical organization and suggested that hierarchical behavior has been selected (in the evolutionary sense) because it is more efficient than democratic or individualistic strategies [10].

This paper describes and analyzes a multiscale gossip algorithm for distributed averaging in grid and 
random geometric graph topologies. The network is recursively partitioned into smaller subnetworks. The 
size of subnetworks at each scale and the number of scales of the partition depend on the number of 
nodes in the network. Multiscale gossip operates over this partition in a bottom-up fashion. First, all nodes 
within each subnetwork at the finest scale gossip until they have computed a suitably accurate local average. Then,
a representative node is elected for each subnetwork; an overlay grid is formed among all representatives 
within the same subnetwork at the next higher scale, and the representatives gossip over the overlay 
grid. This procedure is repeated until representatives at the coarsest scale have computed an accurate 
approximation to the network average. At that point the representatives disseminate their estimate to all 
of their children in the hierarchy. Multi-hop communication between representatives at coarser scales is accomplished using geographic routing [11], [12].

Our main contribution is the analysis of multiscale gossip. In particular, for a carefully designed 
multiscale partition, we show that the total number of single-hop transmissions required to reach a desired 
level of accuracy 1/n scales nearly linearly, requiring O(n log log n log n) total transmissions as n → ∞ on random geometric graph and grid topologies. Consequently, the average number of transmissions per node is O(log log n log n). Since information dissemination (randomized broadcast) is much more efficient than gossip (which is a form of information diffusion) in these topologies, representatives at all scales can optionally disseminate intermediate results to other nodes in their subnetwork, thereby improving the robustness and fault-tolerance of the scheme without affecting the order-wise scaling law. In contrast to geographic gossip with path averaging [13], a randomized gossip scheme with linear scaling that also uses geographic routing to exchange information over multiple hops, multiscale gossip requires fewer and shorter multi-hop transmissions; for example, in a √n × √n grid topology, path averaging requires relaying messages over O(n^{1/2}) hops, whereas multiscale gossip messages at the coarsest scale are relayed over at most O(n^{1/3}) hops, and messages at finer scales travel significantly shorter distances. This has advantages
when reliable transmission (i.e., handshaking, forward error-correcting, and/or retransmission) protocols 
are used at the link-level to ensure accurate reception over each link of a multi-hop path. Moreover, at 
each iteration of multiscale gossip, information is only exchanged between one pair of nodes, as opposed 
to all nodes along a path. 

The remainder of this paper is organized as follows. Section II covers background and related work. Section III describes the procedure for recursively constructing the hierarchical network partition and for carrying out multiscale gossip. Our main results are then presented in Section IV, with analysis and proofs provided in Section V. A numerical evaluation of the proposed algorithm is presented in Section VI. Some practical considerations are discussed in Section VII, and we conclude in Section VIII.



II. Background and Problem Definition 



Our primary measure of performance is communication cost, i.e., the number of messages (single-hop transmissions) required to compute an estimate to ε accuracy, which is also considered in [11], [13]. Moreover, we are interested in characterizing scaling laws, or the rate at which the communication cost increases as a function of network size. In the analysis of scaling laws for gossip algorithms, a commonly studied measure of convergence rate is the ε-averaging time, denoted T_ε(n) and defined as [2]

$$T_\epsilon(n) = \sup_{x(0)} \inf\left\{ t : \Pr\left( \frac{\lVert x(t) - x_{\mathrm{ave}}\mathbf{1} \rVert}{\lVert x(0) \rVert} \geq \epsilon \right) \leq \epsilon \right\}, \tag{1}$$

which is the number of iterations required to reach an estimate with ε accuracy with high probability (here 1 denotes the all-ones vector). The ε-averaging time T_ε(n) reflects the idea that the complexity of gossiping on a particular class of network topologies should depend both on the final accuracy and on the network size. When only neighbouring nodes communicate at each iteration, T_ε(n) and communication cost are identical up to a constant factor. Otherwise, communication cost can generally be bounded by the product of T_ε(n) and a bound on the number of messages required per iteration.
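To make the ε-averaging time concrete, the following is a minimal simulation sketch, assuming the standard pairwise update in which a randomly chosen edge (i, j) is activated and both endpoints replace their estimates by their average. The function name `randomized_gossip` and its stopping rule are illustrative choices of ours: the rule uses the true average x_ave, which is only available in simulation, and it tracks a single sample path rather than the probability appearing in the definition.

```python
import math
import random

def randomized_gossip(x0, edges, eps, max_iters=1_000_000, seed=0):
    """Pairwise randomized gossip (simulation sketch).

    At each iteration one edge (i, j) is drawn uniformly at random and both
    endpoints replace their estimates by their average, which preserves the
    sum of all values. The loop stops once the normalized error
    ||x(t) - x_ave 1|| / ||x(0)|| drops to eps (assumes x0 is not all zeros).
    Returns the final estimates and the number of iterations used.
    """
    rng = random.Random(seed)
    x = list(x0)
    x_ave = sum(x0) / len(x0)
    norm0 = math.sqrt(sum(v * v for v in x0))

    def rel_err():
        return math.sqrt(sum((v - x_ave) ** 2 for v in x)) / norm0

    t = 0
    while t < max_iters and rel_err() > eps:
        i, j = rng.choice(edges)
        x[i] = x[j] = 0.5 * (x[i] + x[j])   # one neighbour exchange
        t += 1
    return x, t
```

For example, on a ring of 10 nodes with edges (i, (i + 1) mod 10) and initial values 0, …, 9, the call returns estimates close to 4.5 together with the number of iterations used. Because every iteration is a single neighbour exchange, that count is proportional to the communication cost, as noted above.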

In wireless sensor network applications, random geometric graphs are a typical model for connectivity 
since communication is restricted to nearby nodes. In the 2-dimensional random geometric graph model, 
n nodes are randomly assigned coordinates uniformly in the unit square, and two nodes are connected 
with an edge when their Euclidean distance is less than or equal to a connectivity radius, r(n) [2], [14].

In [2] it is shown that if the connectivity radius scales as r_con(n) = Θ(√(log n / n)), then the network is connected with high probability. Throughout this paper, when we refer to a random geometric graph, we mean one with connectivity radius r_con(n).
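A minimal sketch of the random geometric graph model just described, using only Python's standard library. The constant c = 3 follows the example value used in Section V; the function names are hypothetical, and the all-pairs distance loop costs O(n²) operations, which is fine for illustration but not for large networks.

```python
import math
import random

def random_geometric_graph(n, c=3.0, seed=0):
    """Place n nodes uniformly at random in the unit square and connect every
    pair at Euclidean distance at most r(n) = sqrt(c * log(n) / n)."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    r = math.sqrt(c * math.log(n) / n)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pos[i], pos[j]) <= r]
    return pos, edges, r

def is_connected(n, edges):
    """Graph search from node 0; True if every node is reached."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n
```

With n = 300 and c = 3, the radius is about 0.24 and each node has roughly 50 neighbours on average, so the sampled graph is connected with overwhelming probability.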

Although the standard neighbor gossip algorithms are known to be efficient on complete graphs and 
expander-like topologies, they are also known to converge slowly on grids and random geometric graphs, 
two topologies commonly used to model wireless networks [2]. Kempe, Dobra, and Gehrke [3] initiated the study of scaling laws for gossip algorithms and showed that gossip requires Θ(n log ε^{-1}) total messages to converge on complete graphs. Boyd, Ghosh, Prabhakar, and Shah [2] studied scaling laws for standard randomized gossip on random geometric graphs and found that communication cost scales as Θ((n²/log n) log ε^{-1}) messages even if the algorithm is optimized with respect to the topology. This finding motivated the pursuit of efficient gossip algorithms for wireless networks in a number of interesting directions. For a complete overview of this line of work, we refer the reader to the recent survey [6]. Here we briefly discuss different approaches, focusing on advances most closely related to
the present article. 

A number of approaches seek more efficient computation while enforcing the constraint that information 
only be exchanged between neighboring nodes at each iteration. Most of these approaches introduce 
memory at each node, creating higher-order updates similar to shift-registers or polynomial filters [15], [16]. Scaling laws for a deterministic, synchronous variant of this approach are presented in [17], leading to Θ(n^{1.5} log ε^{-1}) communication cost. Related asynchronous gossip algorithms based on lifted Markov chains have been proposed that achieve similar scaling laws [18], [19]. Recent work [20] suggests that no gossip algorithm on grids and random geometric graphs can achieve better than Ω(n^{1.5} log ε^{-1}) scaling while constraining information exchange to be solely between neighboring nodes.

A variant called geographic gossip, proposed by Dimakis, Sarwate, and Wainwright [11], achieves a communication cost of Θ((n^{1.5}/√(log n)) log ε^{-1}) by allowing distant (non-neighbouring) pairs of nodes to gossip at each iteration. Assuming that each node knows its own coordinates and the coordinates of its neighbours in the unit square, communication between arbitrary pairs of nodes is made possible using greedy geographic routing. Rather than addressing nodes directly, a message is sent to a randomly chosen target (x, y)-location, and the recipient of the message is the node closest to that target. To reach the target, a message is forwarded from a node to its neighbour that is closest to the target. If a node is closer to the target than all of its neighbours, it is the final message recipient. It is shown in [11] that for random geometric graphs with connectivity radius r(n) = r_con(n), greedy geographic routing succeeds with high probability. For an alternative form of greedy geographic routing, which may be useful in implementations, see [12]. The main contribution of [11] is to illustrate that allowing nodes to gossip over multiple hops can lead to significant improvements in message cost. In follow-up work, Benezit, Dimakis, Thiran, and Vetterli [13] showed that a modified version of geographic gossip, called path averaging, can achieve Θ(n log ε^{-1}) message cost on random geometric graphs. To do this, all nodes along the path from the source to the target participate in a gossip iteration. If geographic routing finds a path through nodes S = {x_i, …, x_j} to deliver a message from x_i to x_j, the estimates of all nodes in S are accumulated on the way to x_j. Then x_j computes the average of all |S| values and sends the average back along the same path towards x_i, and all nodes in S update their estimates.
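The routing rule and the path-averaging update described above can be sketched as follows, assuming node positions and adjacency lists are given. This illustrates the two rules as stated in the text and is not the implementation of [11] or [13]; the names `greedy_route` and `path_average` are hypothetical, and the accumulate-and-return exchange along the path is collapsed into a single function call.

```python
import math

def greedy_route(pos, adj, src, target):
    """Greedy geographic routing toward a target (x, y) location.

    The message moves to the neighbour closest to the target; it stops at a
    node that is closer to the target than all of its neighbours, which is
    the final recipient. Returns the list of visited nodes (the path S).
    """
    path, cur = [src], src
    while True:
        best = min(adj[cur], key=lambda v: math.dist(pos[v], target), default=None)
        if best is None or math.dist(pos[best], target) >= math.dist(pos[cur], target):
            return path
        cur = best
        path.append(cur)

def path_average(x, path):
    """Path averaging: values are accumulated along the path, the last node
    computes their mean, and every node on the path adopts it."""
    avg = sum(x[v] for v in path) / len(path)
    for v in path:
        x[v] = avg
    return x
```

Each path-averaging iteration costs about 2(|S| − 1) single-hop transmissions (the forward accumulation plus the return trip), which is why the path length matters for the per-iteration message cost.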

Observe that there is a tradeoff between algorithmic simplicity and performance. If we only allow pairwise communication between neighboring nodes, we cannot beat the Ω(n^{1.5} log ε^{-1}) barrier. On the other hand, if each node additionally knows its own geographic location and that of its immediate neighbours, we can use geographic routing and, with the added complexity of averaging over paths, bring the message complexity down to linear, at the expense of messages potentially having to travel over O(n^{1/2}) hops. In short, improving upon the performance achievable with pairwise communication between neighboring nodes requires introducing some additional complexity. In this work, rather than averaging along paths, we propose to decompose computation in a multiscale manner in order to achieve faster convergence.

The multiscale approach considered in this paper also assumes that the nodes know their own and their neighbours' coordinates in the unit square. Using the geographic information, we derive a hierarchical algorithm that asymptotically achieves a communication cost of O(n log log n log ε^{-1}) messages, which matches that of path averaging up to a log log n factor. However, in multiscale gossip, information is only exchanged between pairs of nodes, and there is no averaging along paths. At the expense of extra complexity for building the logical hierarchy, besides near-optimal communication cost we achieve two other important goals. First, the longest distance a message travels in our multiscale approach is O(n^{1/3}) hops, which is much shorter than the O(n^{1/2}) hops for geographic gossip or path averaging. This can prove significant if an adversary wishes to disrupt gossip computation by forcing the network to drop a particular message or by deactivating a node in the middle of an iteration. In that scenario, a substantial amount of information can be lost in path averaging, since each iteration involves O(√(n / log n)) nodes on average. Second, as we show later on, multiscale gossip distributes the computation quite evenly across the network and does not overwhelm and deplete the nodes located closer to the center of the unit square, as is the case for path averaging.

We note that we are not the first to propose gossiping in a multiscale or hierarchical manner. Sarkar 
et al. [21] describe a hierarchical approach for computing aggregates, including the average. However, because their algorithm uses order- and duplicate-insensitive synopses to estimate the desired aggregate, the size of each message exchanged between a pair of nodes must scale with the size of the network. Other hierarchical distributed averaging schemes that have been proposed in the literature focus on the synchronous form of gossip; they do not prove scaling laws for communication cost, nor do they provide rules for forming the hierarchy (i.e., they assume the hierarchical decomposition is given) [22], [23], [24]. Finally, we mention that hierarchical approaches to routing have also been proposed [25], [26]. Although this line of work uses similar techniques (hierarchy and a divide-and-conquer approach), the problems considered are not related to distributed averaging.

A preliminary version of this work appears in the conference paper [1], where multiscale gossip is described, including our construction of the multiscale network partition and a simple communication cost analysis. The present manuscript extends [1] in several ways. In addition to a more detailed communication cost analysis, the proof of the final error bound of multiscale gossip is a new result. Moreover, we investigate the role of the subdivision parameter α, explaining in what sense the value α = 2/3 is optimal. Finally, we include a thorough set of experiments to evaluate the performance of multiscale gossip.

Fig. 1. Hierarchical multiscale subdivision of the unit square. At each level, each cell is split into equal numbers of smaller cells. Before the representatives of the cells can gossip on a grid graph, we run gossip on each cell.

III. Multiscale Gossip 

Multiscale gossip performs averaging in a hierarchical manner. At any moment, only nodes at the same level of the hierarchy perform computations at a local scale, and computation at one level begins after the previous level has finished. By hierarchically decomposing the initial graph into subgraphs, we impose
an order in the computation. As shown in the next section, for a specific decomposition it is possible to 
divide the overall work into a small number of linear sub-problems and thus obtain very close to linear 
complexity in the size of the network. 

Assume we have a random geometric graph G = (V, E) and each node knows its own coordinates in the unit square and the locations of its immediate neighbours. Each node also knows the total number of nodes n in the network and k, the desired number of hierarchy levels¹. Figure 1 illustrates an example with k = 3. We use the convention that level k is the lowest level, where the unit square is split into many small cells. Level 1 is the top level, where we only have a few big cells. All cells at the same level have the same area. The way we split each cell into subcells is directed by a subdivision constant α = 2/3, whose value is justified in Section V. If a cell contains n nodes, it is split into n^{1−α} cells of dimensions n^{(α−1)/2} × n^{(α−1)/2} each. At level k the unit square is split into d_k cells C(k,1), …, C(k,d_k). Each cell C(k,·) contains nodes forming a subgraph G(k,·) of G. Each subgraph G(k,·) runs standard randomized gossip until convergence. Then, in each subgraph G(k,·) we elect one representative node L(k,·). The representative selection can be randomized or deterministic, as explained in Section VII. Generally, not all subgraphs have the same number of nodes. For this reason, the value of each representative has to be reweighted proportionally to its subgraph's size. At level k − 1, the unit square is split into d_{k−1} cells C(k−1,1), …, C(k−1,d_{k−1}). Each cell contains the same number of C(k,·) cells. The representatives L(k,·) of the G(k,·) graphs are then organized into logical grid graphs G(k−1,1), …, G(k−1,d_{k−1}), one grid per C(k−1,·) cell. Two representatives L(k,i) and L(k,j) are connected by an edge in a G(k−1,·) if cells C(k,i) and C(k,j) are adjacent and contained in the same C(k−1,·) cell. Note that representatives can determine which cells they are adjacent to, given the current level of hierarchy and n, since the cell construction is deterministic. Next, we run randomized gossip simultaneously on all G(k−1,·) grid graphs. Finally, we select a representative node L(k−1,·) for each G(k−1,·) grid graph and continue to the next hierarchy level.

¹As explained in Section V, given n, the number of levels k can be computed automatically.
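To make the subdivision rule concrete, here is a small sketch that computes, for each level j, the expected number of nodes per cell, the number of cells, and the cell side length. It assumes, as in the analysis of Section V, that a level-j cell holds n^{α^{j−1}} nodes in expectation and that geo-density makes cell area proportional to node count. The function name is hypothetical, and the values are idealized real numbers; an actual partition must round them to integers.

```python
def hierarchy_levels(n, k, alpha=2/3):
    """Idealized partition statistics for levels j = 1..k.

    A level-j cell is assumed to hold n**(alpha**(j-1)) nodes in expectation;
    by geo-density its area is (nodes per cell) / n. Returns a list of
    (nodes_per_cell, num_cells, side_length) tuples.
    """
    levels = []
    for j in range(1, k + 1):
        nodes_per_cell = n ** (alpha ** (j - 1))
        num_cells = n / nodes_per_cell
        side = (nodes_per_cell / n) ** 0.5
        levels.append((nodes_per_cell, num_cells, side))
    return levels
```

With n = 2^18, for example, level 2 has 64 cells of about 4096 nodes (side 1/8) and level 3 has 1024 cells of about 256 nodes (side 1/32). Each level-2 cell is thus split into 4096^{1/3} = 16 subcells, matching the rule that a cell with N nodes is split into N^{1−α} cells.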

The process is repeated until we reach level 1, at which point we have only one grid graph G(1,1) contained in the single cell C(1,1). By construction, C(1,1) coincides with the unit square. Once randomized gossip on G(1,1) is over, each node of G(1,1), as a representative L(2,·), disseminates its final value to all the nodes in its C(2,·) cell.

Algorithm 1 describes multiscale gossip in a recursive manner. The initial call to the algorithm has as arguments the vector of initial node values (x_init), the unit square (C = [0, 1] × [0, 1]), the network size n, the top level q = 1, the desired number of hierarchy levels k, and the desired error tolerance ε to be used by each invocation of randomized gossip. In a down-pass, the unit square is split into smaller and smaller cells all the way to the C(k,·) cells. After gossiping in the G(k,·) subgraphs in Line 15, the representatives adjust their values (Line 16). As explained in the next section, if k is large enough, each G(k,·) is a complete graph. Since each node knows the locations of its immediate neighbours (needed for geographic routing), at level k we can also compute the size of each G(k,·) graph, which is needed for the reweighting. The up-pass begins with the L(k,·) representatives forming the G(k−1,·) grid graphs (Line 8) and then running gossip in all of them in parallel. Between consecutive levels we use a parameter α = 2/3 to decide how many C(r+1,·) cells fit in each C(r,·) cell. As mentioned earlier, the motivation for this parameter and its specific value is explained in the sequel. Notice that the pseudocode mimics a sequential single-processor execution, which is in line with the analysis that follows in Section V. However, it should be emphasized that the algorithm is intended for, and can be implemented in, a distributed fashion. The notation x_init(C) or x_init(L) indicates that we only select the entries of x_init corresponding to nodes in cell C or representatives L.

Algorithm 1 MultiscaleGossip(x_init, C, n, q, k, ε)

1: α = 2/3
2: if q < k then
3:   Split C into m_{q+1} = n^{1−α} cells: C(q+1,1), …, C(q+1,m_{q+1})
4:   Select a representative node L(q+1,i) for each cell C(q+1,i), i ∈ {1, …, m_{q+1}}
5:   for all cells C(q+1,i) do
6:     call MultiscaleGossip(x_init(C(q+1,i)), C(q+1,i), n^α, q+1, k, ε)
7:   end for
8:   Form grid graph G(q,·) of representatives L(q+1,·)
9:   call RandomizedGossip(x_init(L(q+1,1), …, L(q+1,m_{q+1})), G(q,·), ε)
10:  if q = 1 then
11:    Spread value of L(2,i) to all nodes in each C(2,i)
12:  end if
13: else
14:  Form graph G(k,·) only of nodes in V(G) contained in C
15:  call RandomizedGossip(x_init(C), G(k,·), ε)
16:  Reweight representative values as: x(L(k,·)) = x(L(k,·)) · |V(G(k,·))| / n   (n: expected number of nodes in C)
17: end if
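The recursion in Algorithm 1 can be sketched sequentially as follows. This is a hypothetical illustration, not the authors' implementation: exact averaging stands in for every RandomizedGossip call, each cell is cut into an s × s grid with s taken from the hypothetical helper `split_sides` (which rounds N^{(1−α)/2} to an integer), the expected cell size is taken as n divided by the number of finest cells, and an empty cell simply contributes 0.

```python
import random

def split_sides(n, k, alpha=2/3):
    """Hypothetical helper: number of subcells per side at each of the k - 1
    splits, so that a cell expected to hold N nodes is cut into roughly
    N**(1 - alpha) equal squares whose children hold N**alpha nodes."""
    sides, N = [], n
    for _ in range(k - 1):
        sides.append(max(1, round(N ** ((1 - alpha) / 2))))
        N = N ** alpha
    return sides

def multiscale_gossip(pos, x, sides, n):
    """Sequential sketch of Algorithm 1: exact averaging stands in for every
    RandomizedGossip call. Returns the value reached at the top level, which
    Line 11 would then spread to all nodes."""
    finest_cells = 1
    for s in sides:
        finest_cells *= s * s
    n_expected = n / finest_cells      # expected nodes per finest cell

    def recurse(x0, y0, side, level, members):
        if level == len(sides):
            # Lines 14-16: gossip in the cell gives the local average; the
            # representative reweights it by |V(G(k,.))| / n_expected,
            # i.e. local_sum / n_expected. An empty cell contributes 0.
            return sum(x[i] for i in members) / n_expected
        s = sides[level]
        sub = side / s
        buckets = {}
        for i in members:              # Line 3: assign nodes to subcells
            cx = min(int((pos[i][0] - x0) / sub), s - 1)
            cy = min(int((pos[i][1] - y0) / sub), s - 1)
            buckets.setdefault((cx, cy), []).append(i)
        # Lines 5-7: recurse into every subcell (one representative each).
        reps = [recurse(x0 + cx * sub, y0 + cy * sub, sub, level + 1,
                        buckets.get((cx, cy), []))
                for cx in range(s) for cy in range(s)]
        # Lines 8-9: gossip on the s-by-s grid of representatives converges
        # to the unweighted mean of their (already reweighted) values.
        return sum(reps) / len(reps)

    return recurse(0.0, 0.0, 1.0, 0, list(range(n)))
```

Because each finest cell reports local_sum / n_expected and every coarser level takes an unweighted mean over equally many subcells, the top-level value equals sum(x) / n exactly, even when cells hold unequal numbers of nodes. This is the purpose of the reweighting in Line 16. For n = 500 and k = 3, `split_sides` gives [3, 2], i.e., a 3 × 3 grid at the top and 2 × 2 grids below it.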



IV. Main Results 



Before proceeding with the detailed analysis, we present our main results here. Proofs are provided in Section V below. Section III above describes multiscale gossip, an algorithm for distributed average consensus on random geometric graphs which uses randomized gossip as a black box. If each invocation of randomized gossip runs up to ε accuracy, the total number of messages used by multiscale gossip is given in Theorem 1.

Theorem 1. Let a random geometric graph G of size n and a constant ε > 0 be given. As the graph size n → ∞, the communication cost of the multiscale gossip scheme described above with subdivision constant α = 2/3 behaves as follows:

1) If the number of hierarchy levels k remains fixed as n → ∞, then the communication cost of multiscale gossip is O(((k − 1)n + n^{1+(2/3)^{k−1}}) log ε^{-1}) messages.
2) If the number of levels grows according to k = Θ(log log n) as n → ∞, then the communication cost of multiscale gossip is O(n log log n log ε^{-1}) messages.

Note that ε in the theorem above is the target level of relative accuracy used each time we run randomized gossip on one overlay network, and not the level of accuracy of the final average. Errors at intermediate levels can accumulate, but this accumulation is not catastrophic. An upper bound on the final accuracy is given in Theorem 2 below.

Theorem 2. Let a random geometric graph G with n nodes and initial values on the nodes x be given. If we run multiscale gossip on G using k levels of hierarchy and demand ε-accuracy for randomized gossip at each subgraph, then with probability at least (1 − ε)^g, where g is the total number of invocations of randomized gossip, the final error is not more than nε, i.e.,

$$\frac{\lVert x_{\mathrm{final}} - x_{\mathrm{av}} \rVert}{\lVert x \rVert} \leq n\epsilon, \tag{2}$$

where x_final denotes the vector of final estimates at each node, and x_av denotes an n-vector whose entries are all set to the average of the initial values at each node.

The bound in Theorem 2 is loose in two senses, as explained at the end of its proof; namely, both the error bound and the probability with which the result holds are loose. The recursive partitioning scheme produces g = 1 + n^{1/3} + n^{5/9} + ⋯ + n^{1−(2/3)^{k−1}} = O(kn) graphs and one invocation of randomized gossip for each. We can control both the accuracy and the probability of success by carefully setting the value of ε, the accuracy used each time we invoke randomized gossip within the overall multiscale gossiping procedure. If we want the final accuracy of multiscale gossip to be δ with high probability, we set the required accuracy for each randomized gossip call to ε = δ/(kn). This will yield final accuracy δ/k, which is in fact better than required. Moreover, since g ≤ kn, the probability of achieving this accuracy will be at least (1 − ε)^g ≥ (1 − δ/(kn))^{kn} ≥ 1 − δ, where the last step is Bernoulli's inequality. The adjustment in ε also affects the total number of transmissions, as per Theorem 1. Specifically, multiscale gossip requires O(n log log n log ε^{-1}) = O(n log log n log(kn/δ)) = O(n log log n log(kn) + n log log n log δ^{-1}) transmissions. As we see, the number of transmissions only increases by a logarithmic factor. In particular, letting the number of levels of hierarchy scale as k = Θ(log log n) and taking δ = 1/n yields an overall message complexity of O(n log log n log n).
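The parameter choice can be checked numerically. The sketch below assumes the per-call tolerance ε = δ/(kn) and the idealized cell counts of Section V (so g is real-valued). It computes ε and the number of gossip invocations g, and the checks that follow it verify the accuracy and probability bounds. The function names are hypothetical.

```python
import math

def per_call_accuracy(n, k, delta):
    """Tolerance for every RandomizedGossip call: eps = delta / (k * n)."""
    return delta / (k * n)

def num_gossip_calls(n, k, alpha=2/3):
    """g = sum over levels j = 1..k of n**(1 - alpha**(j-1)), i.e. one call
    per graph at each level (real-valued, as cell counts are idealized)."""
    return sum(n ** (1 - alpha ** (j - 1)) for j in range(1, k + 1))
```

For n = 10^6, k = 4 and δ = 0.01, g is roughly 1.9 × 10^4, far below kn = 4 × 10^6, so the bound g ≤ kn used above is very loose. Meanwhile nε = δ/k and log ε^{-1} = log(kn) + log δ^{-1}, matching the expressions in the text.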

Besides the above main theoretical results, we have compared multiscale gossip to path averaging, which is a recent state-of-the-art linear-complexity algorithm. The experiments presented in Section VI suggest that multiscale gossip has superior performance for graphs of up to many thousands of nodes. We also include an evaluation in scenarios with unreliable transmissions.

V. Analysis 

A. Proof of Theorem 1

Suppose we run multiscale gossip (Algorithm 1) on a random geometric graph G = (V, E) with |V| = n and transmission radius r(n) = √(c log n / n). The topmost cell in the partition hierarchy is the unit square, which we call cell C(1,1). We partition C(1,1) down to k levels. At the highest level (level 1), we split the unit square into m_2 = n^{1−α} cells C(2,·), each of area 1/m_2 = n^{α−1} and dimensions n^{(α−1)/2} × n^{(α−1)/2}. Below we explain why α = 2/3. In each cell C(2,·) we select a representative node L(2,·). Representative nodes at this level form an overlay grid where logical edges exist between representatives of adjacent cells. Messages over logical edges may need multi-hop transmissions since the representatives will generally be out of each other's range. The partition process repeats recursively within each C(2,·) cell and so on until we reach the bottom level k.

In general, on a 2-D grid of p nodes, randomized gossip requires O(p² log ε^{-1}) messages to achieve accuracy ε with probability 1 − ε (see, e.g., [2]). The grid graph formed by the representatives of the C(2,·) cells has n^{1−α} nodes. By using an appropriately large constant c in the transmission radius (e.g., c = 3), the random geometric graph G is geo-dense [27], which means that a patch of area n^{α−1} contains Θ(n^{α−1} · n) = Θ(n^α) nodes with high probability. The maximum distance between two neighbouring nodes of G(1,1) is √5 n^{(α−1)/2} = O(n^{−1/6}). To see this, compute the maximum possible distance between two nodes in adjacent C(2,·) cells using the Pythagorean theorem: together, the two cells form a rectangle of size n^{(α−1)/2} × 2n^{(α−1)/2}, whose diagonal has length √5 n^{(α−1)/2}. If we divide by r(n), we get a worst-case estimate of the cost for multi-hop messages between representatives at level 1:

$$\mathrm{MsgCost}_1 = O\left(\frac{\sqrt{5}\, n^{(\alpha-1)/2}}{n^{-1/2}}\right) = O\left(n^{\alpha/2}\right) \text{ single-hop transmissions.} \tag{3}$$

Notice that we have ignored the factor of √(c log n), thus slightly overestimating the message cost. We do this to simplify the analysis and get a clean expression, which allows us to compute the subdivision constant α. Knowing the cost of one (multi-hop) message at level 1 and the size of the grid G(1,1), the total number of single-hop transmissions for randomized gossip to converge on G(1,1) will be

$$O\left(\left(n^{1-\alpha}\right)^{2}\log\epsilon^{-1}\right) \cdot O\left(n^{\alpha/2}\right) = O\left(n^{2-3\alpha/2}\log\epsilon^{-1}\right),$$

which is O(n log ε^{-1}) if α = 2/3.
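The exponent bookkeeping above can be checked with exact rational arithmetic. This is a sketch using Python's `fractions` module; the helper names are hypothetical, and, like (3), it drops the √(c log n) factor.

```python
import math
from fractions import Fraction

def level1_exponent(alpha):
    """Exponent of n in the level-1 cost: (n**(1 - alpha))**2 gossip messages
    on the representatives' grid, each relayed over about n**(alpha / 2)
    hops, so the exponent is 2(1 - alpha) + alpha/2 = 2 - 3 alpha / 2."""
    return 2 * (1 - alpha) + alpha / 2

def level1_hops(n, alpha):
    """Worst-case hops between adjacent level-1 representatives: the
    diagonal sqrt(5) * n**((alpha - 1) / 2) divided by n**(-1/2), i.e. r(n)
    without its sqrt(c log n) factor, as in (3)."""
    return math.sqrt(5) * n ** ((alpha - 1) / 2) / n ** -0.5

# Solving 2 - 3*alpha/2 = 1 exactly gives alpha = 2/3.
ALPHA_LINEAR = (Fraction(2) - 1) / Fraction(3, 2)
```

For α = 1/2, for instance, the level-1 exponent is 5/4, so the top level alone would already require more than linearly many transmissions. With α = 2/3 and n = 10^6, a level-1 message travels about √5 · 100 hops.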

Next, let us look at the cost at the next level, where we subdivide the cells C(2,·). This will be instructive of how the process goes at any other level but the last. Each cell C(2,·) contains q = Θ(n^α) nodes and is subdivided into q^{1−α′} cells C(3,·) containing q^{α′} nodes each (in expectation). Now, using again the geo-density property, a cell C(3,·) containing q^{α′} nodes will have area q^{α′}/n and dimensions √(q^{α′}/n) × √(q^{α′}/n). Each cell C(3,·) also contains a representative node L(3,·). Following the same logic as before, we can compute the cost of a message between the L(3,·) representatives based on the worst possible distance between nodes contained in adjacent cells C(3,·). We just divide the cell dimension by r(n), omitting the logarithmic factor, to get MsgCost_2 = O(q^{α′/2}).

We have n^~^ grid graphs G(2,.)- With a = | and q = 9(n"), to make the total number of transmissions 
at level 2 linear, we must have n^~^ • 0{{q^~^')'^ loge~^) • 0{q^) = 0(n log e~^) =^ a' = | as well. In 
general, at any intermediate level j the total number of transmissions is the number of grids, times the 
number of messages per grid, times the number of single hop transmissions per message to get between 
neighbouring representative nodes. Based on the above logic and using the same subdivision parameter 
a at all levels, the expression for the cost at some level j is 

n'-^'-'.o( ( fn»^-^V""Vlog-Vo f (w^'-y ] (4) 




which is linear in $n$ if $a = \frac{2}{3}$. Finally, we need to treat the last level, which is $k$. At the last level we no longer have grids formed by representatives. Instead, the algorithm runs randomized gossip on each subgraph of $G$ with nodes contained inside each of the $C(k,\cdot)$ cells. We have $\Theta(n^{1-(2/3)^{k-1}})$ cells, each containing $n^{(2/3)^{k-1}}$ nodes which are close enough to communicate via single-hop messages. Since we run randomized gossip on each subgraph, the total number of messages at the last level is $\Theta(n^{1+(2/3)^{k-1}}\log\epsilon^{-1})$. Summing up all levels, plus $n$ messages to spread the final result back to all nodes, the total number of messages for multiscale gossip is
$$O\left(\left((k-1)n + n^{1+(2/3)^{k-1}}\right)\log\epsilon^{-1} + n\right).$$
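To make the exponent bookkeeping behind Eq. (4) concrete, the short Python sketch below evaluates, for a constant subdivision parameter $a$, the exponent of $n$ in the number of transmissions at each intermediate level and at the last level, dropping constants and the common $\log\epsilon^{-1}$ factor. The function names are illustrative and not part of the paper; the code only restates the scaling laws derived above.

```python
def level_exponent(j, a):
    """Exponent of n in the level-j cost of Eq. (4), valid for 1 <= j <= k-1."""
    grids = 1 - a ** (j - 1)                    # n^{1 - a^{j-1}} grids at level j
    msgs_per_grid = 2 * a ** (j - 1) * (1 - a)  # (n^{a^{j-1}(1-a)})^2 gossip messages per grid
    hops_per_msg = a ** j / 2                   # n^{a^j / 2} hops between representatives
    return grids + msgs_per_grid + hops_per_msg


def last_level_exponent(k, a):
    """Exponent of n at level k: n^{1-a^{k-1}} cells, each charged (n^{a^{k-1}})^2 messages."""
    return (1 - a ** (k - 1)) + 2 * a ** (k - 1)


if __name__ == "__main__":
    k = 5
    for a in (0.5, 2 / 3, 0.8):
        inter = [round(level_exponent(j, a), 4) for j in range(1, k)]
        print(f"a = {a:.3f}: levels 1..{k - 1} -> {inter}, level {k} -> {last_level_exponent(k, a):.4f}")
```

For $a = 2/3$ every intermediate exponent equals 1 (linear cost), while $a = 1/2$ makes the intermediate levels superlinear; a larger value such as $a = 0.8$ lowers the intermediate exponents but leaves more nodes per finest cell, which raises the last-level exponent.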

For the second part of the theorem, observe that at level $k$ each cell contains a subgraph of $n^{(2/3)^{k-1}}$ nodes in expectation. For constants $m > 2$ and $M > m$, we can choose $k$ so that each cell at the finest scale contains between $m$ and $M$ nodes with high probability, so that the cost per cell is bounded by a constant multiple of $\log\epsilon^{-1}$. In other words, choose $k$ such that $m \le n^{(2/3)^{k-1}} \le M$; taking logarithms twice gives $k - 1 = \log_{3/2}(\log n) + O(1)$, i.e., $k = \Theta(\log\log n)$. Since the cost per cell at level $k$ is now bounded by a constant (times $\log\epsilon^{-1}$) for $k = \Theta(\log\log n)$, the total level-$k$ cost is $O(n^{1-(2/3)^{k-1}}\log\epsilon^{-1})$ and the overall cost is $O\left(\left((k-1)n + n^{1-(2/3)^{k-1}}\right)\log\epsilon^{-1} + n\right) = O(n\log\log n\log\epsilon^{-1})$, completing the proof.
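The choice of $k$ in the second part of the proof can be checked numerically. The sketch below assumes the expected cell size $n^{(2/3)^{k-1}}$ used above and a hypothetical upper limit $M$ on the number of nodes per finest cell; it finds the smallest such $k$ both by direct search and from the closed form obtained by taking logarithms twice. Both function names are illustrative.

```python
import math


def levels_needed(n, M, a=2 / 3):
    """Smallest k with n^{a^{k-1}} <= M, i.e. finest cells hold at most M nodes in expectation."""
    if M <= 1:
        raise ValueError("M must exceed 1")
    k = 1
    while n ** (a ** (k - 1)) > M:
        k += 1
    return k


def levels_closed_form(n, M, a=2 / 3):
    """Same k from logs: a^{k-1} log n <= log M  <=>  k - 1 >= log(log n / log M) / log(1/a)."""
    return 1 + max(0, math.ceil(math.log(math.log(n) / math.log(M)) / math.log(1 / a)))


if __name__ == "__main__":
    for n in (5_000, 10**6, 10**12):
        print(n, levels_needed(n, M=10), levels_closed_form(n, M=10), round(math.log(math.log(n)), 2))
```

Doubling $\log n$ (i.e., squaring $n$) adds only about $\log 2/\log(3/2) \approx 1.7$ levels, which is the $\Theta(\log\log n)$ growth used in the proof.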
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B. Proof of Theorem |2] 

To simplify the discussion, we assume that each subgraph at each level has the same number of nodes. 
By geo-density of the random geometric graph lIZTll . the number of nodes in each cell concentrates quickly 
as n grows. The same proof technique can be extended to the general case, at the expense of much more 
cumbersome notation. 

If each subgraph at level $i$ has $L_i$ nodes, then we have $n = \prod_{i=1}^{k} L_i$. We introduce some notation to analyze the process of error propagation in its general form. The initial vector of values is $(x_1, \ldots, x_n)$. It is convenient to rewrite each element of $\mathbf{x}$ as $x_{l_1 l_2 \cdots l_k}$, where $1 \le l_i \le L_i$. We will write the whole vector as $\{x_{l_1 \cdots l_k}\}$ using braces. We overload our notation to describe the values of nodes that gossip at any level. For example, at level $j$ we have the node values $x_{l_1 l_2 \cdots l_j}$, where the number of indices indicates the level. We will use $*$ notation to signify the converged values after gossiping at any level; e.g., after gossiping at level $j$, the node values are transformed to $x^*_{l_1 l_2 \cdots l_j}$. Moreover, to advance from level $j+1$ to level $j$ we need to select one node in each subgraph at level $j+1$ as a representative and promote its value to the next level. This means that $x_{l_1 \cdots l_j} = x^*_{l_1 \cdots l_j c}$ for some $c$ in the range $1 \le c \le L_{j+1}$.

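Let us also write $m^x_{l_1 \cdots l_{j-1}}$ to denote the mean of the values in a subgraph at level $j$, i.e.,

$$m^x_{l_1 \cdots l_{j-1}} = \frac{1}{L_j}\sum_{l_j=1}^{L_j} x_{l_1 \cdots l_{j-1} l_j}. \tag{5}$$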

To begin the proof we first state the error bounds for each intermediate subgraph after running randomized gossip to $\epsilon$-accuracy. At level $j$, for each subgraph we have

$$\frac{\left\|\{x^*_{l_1 \cdots l_{j-1} 1:L_j}\} - \mathbf{m}^x_{l_1 \cdots l_{j-1}}\right\|}{\left\|\{x_{l_1 \cdots l_{j-1} 1:L_j}\}\right\|} \le \epsilon, \tag{6}$$

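where $\mathbf{m}$ signifies a vector with all its elements equal to $m$. Using the definition of the 2-norm, the inequality must also hold for each summand, and so,

$$\epsilon\left\|\{x_{l_1 \cdots l_{j-1} 1:L_j}\}\right\| \ge \left|x^*_{l_1 \cdots l_{j-1} c} - m^x_{l_1 \cdots l_{j-1}}\right| \tag{7}$$
$$= \left|x_{l_1 \cdots l_{j-1}} - m^x_{l_1 \cdots l_{j-1}}\right|, \tag{8}$$

where the last equality follows since $x^*_{l_1 \cdots l_{j-1} c} = x_{l_1 \cdots l_{j-1}}$. At the top level 1 we have:

$$\frac{\left\|\{x^*_{1:L_1}\} - \mathbf{m}^x\right\|}{\left\|\{x_{1:L_1}\}\right\|} \le \epsilon. \tag{9}$$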
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We can obtain a bound by squaring both sides and using the definition of the norm, and a second 
bound is again obtained by observing that each summand must be less than the right hand side: 

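$$\sum_{l_1=1}^{L_1}\left(x^*_{l_1} - m^x\right)^2 \le \epsilon^2\left\|\{x_{1:L_1}\}\right\|^2 \tag{10a}$$
$$\left|x^*_{l_1} - m^x\right| \le \epsilon\left\|\{x_{1:L_1}\}\right\|. \tag{10b}$$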

Once level 1 is finished, the final values are distributed to all the nodes that each node represents. We are interested in bounding the following:

$$\frac{\left\|(x^*_1, \ldots, x^*_1, \ldots, x^*_{L_1}, \ldots, x^*_{L_1}) - \mathbf{x}_{av}\right\|}{\|\mathbf{x}\|}, \tag{11}$$

where in the above expression each value $x^*_{l_1}$ is repeated $L_k \times \cdots \times L_2$ times.

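We start by bounding the squared expression for simplicity. Using the definition of the 2-norm, adding and subtracting the mean at level 1, expanding the quadratic term, and using the bound (10a), we obtain

$$\frac{\left\|(x^*_1, \ldots, x^*_1, \ldots, x^*_{L_1}, \ldots, x^*_{L_1}) - \mathbf{x}_{av}\right\|^2}{\|\mathbf{x}\|^2} \tag{12}$$
$$= \frac{L_k \cdots L_2 \sum_{l_1=1}^{L_1}\left(x^*_{l_1} - x_{av}\right)^2}{\|\mathbf{x}\|^2} \tag{13}$$
$$= \frac{L_k \cdots L_2 \sum_{l_1=1}^{L_1}\left(x^*_{l_1} - m^x + m^x - x_{av}\right)^2}{\|\mathbf{x}\|^2} \tag{14}$$
$$= \frac{L_k \cdots L_2\left[\sum_{l_1=1}^{L_1}\left(x^*_{l_1} - m^x\right)^2 + 2\left(m^x - x_{av}\right)\sum_{l_1=1}^{L_1}\left(x^*_{l_1} - m^x\right) + L_1\left(m^x - x_{av}\right)^2\right]}{\|\mathbf{x}\|^2} \tag{15}$$
$$\le \frac{L_k \cdots L_2\left[\epsilon^2\left\|\{x_{1:L_1}\}\right\|^2 + L_1\left(m^x - x_{av}\right)^2\right]}{\|\mathbf{x}\|^2}. \tag{16}$$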



We arrived at the last inequality after noticing that $\sum_{l_1=1}^{L_1}(x^*_{l_1} - m^x) = 0$. The reason is that randomized gossip does not change the average (and thus the sum) of the values at any time. So for both the initial and the converged values at level 1 we have $\sum_{l_1=1}^{L_1} x_{l_1} = \sum_{l_1=1}^{L_1} x^*_{l_1}$. By equation (5), $\sum_{l_1=1}^{L_1} x_{l_1} = L_1 m^x = \sum_{l_1=1}^{L_1} m^x$, and so $\sum_{l_1=1}^{L_1} x^*_{l_1} = \sum_{l_1=1}^{L_1} m^x$.
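The sum-preservation property used above is easy to see in code. The sketch below implements plain pairwise randomized gossip on a small complete graph (an assumption made only to keep the example short; in the paper gossip runs on grids and geometric subgraphs) and checks that the sum of the values is unchanged while the values approach their average. `randomized_gossip` is a hypothetical name.

```python
import random


def randomized_gossip(x, num_iters, seed=0):
    """Pairwise randomized gossip; any two nodes may talk (complete graph, an assumption here)."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(num_iters):
        i, j = rng.sample(range(len(x)), 2)  # pick a random pair of distinct nodes
        avg = (x[i] + x[j]) / 2.0
        x[i] = avg                           # x_i + x_j is unchanged by this update,
        x[j] = avg                           # so the total sum is invariant
    return x


if __name__ == "__main__":
    x0 = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
    x = randomized_gossip(x0, num_iters=200)
    print(sum(x0), sum(x), max(x) - min(x))
```

Because every update replaces two values by their mean, the sum (and hence the average) is invariant up to floating-point rounding, which is exactly the fact that makes the cross term in Eq. (15) vanish.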

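Next we focus on bounding the two parts of the above numerator separately. For details of the derivation please see the appendix. The final results are:

$$\frac{\left\|\{x_{1:L_1}\}\right\|^2}{\|\mathbf{x}\|^2} \le 1, \tag{17}$$
$$\frac{L_1\left(m^x - x_{av}\right)^2}{\|\mathbf{x}\|^2} \le L_1(k-1)^2\epsilon^2. \tag{18}$$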
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Using ( fTSl ) and ( fTSl ) with ( |10a| ),( [T0bl ) we obtain the overall bound: 

|2 



^-[^ , . . . , ^-[^ , . . . , , . . . , y ^, 



(19) 



< L,...L2 (62||{xi:^Jf + Li(m^^-x.,)2) ^^^^ 



< Lfc...L2(62 + Li(fc-l)V) (21) 

< Lk...L2e^ ^n{k-lfe^. (22) 



Finally, by bounding L]^ . . . L2 < n and (A: — 1)^ < n we get 

II (x-[^ , . . . , , . . . , Xj^^ 5 • • • 5 '^Li ) ~ 



|2 



av 



\X 



12 



(23) 



< ne^ + n^e^ < 2n^e^ (24) 



and so we arrive at the bound 

>k >k >k >k \ I I 

» ' ' nf* ' ' ^ nf* I I 

< v^ne. (25) 



(X]^ , . . . , X]^ , . . . , X^^ 5 • • • 5 j 



This bound holds whenever all randomized gossip operations at intermediate subgraphs achieve $\epsilon$ accuracy. Any invocation of randomized gossip achieves $\epsilon$ accuracy with probability at least $1 - \epsilon$, and all randomized gossip operations are independent of each other. If $g$ subgraphs in total appear during a run of multiscale gossip, the probability that we achieve a final error of at most $\sqrt{2}\,n\epsilon$ is at least $(1-\epsilon)^g$.
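A toy simulation can be used to sanity-check the error propagation argument. The sketch below is a simplified stand-in for multiscale gossip, not the authors' implementation: every subgraph is treated as a complete graph on which pairwise gossip runs until its relative error is at most $\epsilon$, all subgraphs at a level have the same size (as assumed in this proof), and the first node of each subgraph is promoted as its representative. The helper names `gossip_to_accuracy` and `multiscale_average` are hypothetical. The script compares the final relative error with the bound $\sqrt{2}\,n\epsilon$ of Eq. (25).

```python
import math
import random


def gossip_to_accuracy(values, eps, rng):
    """Pairwise randomized gossip on a complete graph until ||x - mean|| <= eps * ||x_initial||."""
    x = list(values)
    if len(x) < 2:
        return x
    mean = sum(x) / len(x)
    norm0 = math.sqrt(sum(v * v for v in x))
    while math.sqrt(sum((v - mean) ** 2 for v in x)) > eps * norm0:
        i, j = rng.sample(range(len(x)), 2)
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x


def multiscale_average(x, sizes, eps, seed=0):
    """Toy multiscale averaging; sizes = [L_1, ..., L_k], x ordered with index l_k varying fastest."""
    rng = random.Random(seed)
    vals = list(x)
    for L in reversed(sizes[1:]):               # levels k, k-1, ..., 2
        reps = []
        for start in range(0, len(vals), L):
            block = gossip_to_accuracy(vals[start:start + L], eps, rng)
            reps.append(block[0])               # promote one representative (c = 1)
        vals = reps
    top = gossip_to_accuracy(vals, eps, rng)     # level 1
    copies = len(x) // len(top)                 # each x*_{l_1} repeated L_k * ... * L_2 times
    return [v for v in top for _ in range(copies)]


if __name__ == "__main__":
    sizes = [4, 5, 6]                           # n = 120 nodes, k = 3 levels
    n = math.prod(sizes)
    data_rng = random.Random(1)
    x = [data_rng.random() for _ in range(n)]
    eps = 1e-3
    y = multiscale_average(x, sizes, eps)
    x_av = sum(x) / n
    rel_err = math.sqrt(sum((v - x_av) ** 2 for v in y)) / math.sqrt(sum(v * v for v in x))
    print(f"relative error {rel_err:.2e}, bound sqrt(2)*n*eps = {math.sqrt(2) * n * eps:.2e}")
```

One should expect the observed error to sit well below $\sqrt{2}\,n\epsilon$, consistent with the remark that the bound is loose.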

Notice that this bound is relatively loose. This should be expected, given that it was obtained using very loose bounds for worst-case errors at all levels through equations (6) and (10). Moreover, if the number of subgraphs $g$ is large, the final probability of success is low. As explained in Section IV, however, we can select an $\epsilon$ to control both the final accuracy and the probability of success at the expense of logarithmically more transmissions.
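One simple way to pick the per-invocation accuracy, offered only as an illustration and not necessarily the procedure of Section IV, is to require both $\sqrt{2}\,n\epsilon' \le \epsilon_{target}$ and $(1-\epsilon')^g \ge 1-\delta$. By Bernoulli's inequality $(1-\epsilon')^g \ge 1 - g\epsilon'$, so $\epsilon' = \min\{\epsilon_{target}/(\sqrt{2}\,n),\ \delta/g\}$ suffices. The helper name and the example values of $n$, $g$, $\delta$ are hypothetical.

```python
import math


def per_call_accuracy(target, delta, n, g):
    """Hypothetical helper: accuracy eps' for every gossip call so that the final
    error bound sqrt(2)*n*eps' is at most `target` and (1 - eps')**g >= 1 - delta.

    Bernoulli's inequality (1 - e)**g >= 1 - g*e means eps' <= delta / g suffices.
    """
    return min(target / (math.sqrt(2) * n), delta / g)


if __name__ == "__main__":
    n, g = 5000, 1000              # illustrative values only
    target, delta = 1e-4, 0.01
    eps_p = per_call_accuracy(target, delta, n, g)
    # each gossip call costs O(log(1/eps')) rounds instead of O(log(1/target))
    print(eps_p, math.log(1 / eps_p) / math.log(1 / target))
```

Since each gossip call costs $O(\log\epsilon'^{-1})$ rounds, the price of this choice is an additive $O(\log n + \log g + \log\delta^{-1})$ term inside the logarithmic factor.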

C. Is $a = 2/3$ optimal?

We have selected $a = \frac{2}{3}$ in the previous sections to get linear cost at each intermediate hierarchy level. One could ask whether this is the best we can do; perhaps a different choice of $a$ could yield an even smaller communication cost. We investigate this question here. As we will see, although $a = \frac{2}{3}$ is not the unique optimal option, it is a well-justified choice.

For convenience in the analysis we change the notation slightly. In Sections III and V we use $a = \frac{2}{3}$ as a rule for subdividing each cell at one level into its subcells. Here, let us assume that we have subdivision parameters $b_1, b_2, \ldots, b_{k-1}$ with a slightly different meaning. While $a$ is "local" and describes the transition from one level to the next, $b_j$ directly specifies how many cells, and how many nodes in each cell, we have at level $j$. Specifically, at level $j$ there will be a total of $n^{1-b_{j-1}}$ cells, and each cell will contain $n^{b_{j-1}}$ nodes and $n^{b_{j-1}-b_j}$ leaders communicating over distances of $n^{b_j/2}$ hops (with the convention $b_0 = 1$). There is a connection between the $b$'s since, if $a$ is constant for all levels, then $b_j = a^j$. We also require that $b_j < b_{j-1}$; otherwise, there would be more nodes in a $C_{j+1}$ cell than in a $C_j$ cell, which is not consistent with our notion of refining the hierarchical partition.

Recall from the complexity analysis that the total number of messages $M_j$ at some level $j$ is

$$M_j = (\#\,C_j \text{ cells}) \times (\#\,\text{leaders per } C_j)^2 \times (\#\,\text{hops per message}). \tag{26}$$

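It is useful to write the exact expressions for the most important cases:

• At level 1 we only have one cell, so:

$$M_1 = 1 \cdot \left(n^{1-b_1}\right)^2 \cdot n^{b_1/2} = n^{2-\frac{3}{2}b_1} \text{ messages.} \tag{27}$$

• At level $1 < j < k$:

$$M_j = n^{1-b_{j-1}} \cdot \left(n^{b_{j-1}-b_j}\right)^2 \cdot n^{b_j/2} = n^{1+b_{j-1}-\frac{3}{2}b_j} \text{ messages.} \tag{28}$$

• At the last level $k$ all messages can be delivered in one hop, so:

$$M_k = n^{1-b_{k-1}} \cdot \left(n^{b_{k-1}}\right)^2 \cdot 1 = n^{1+b_{k-1}} \text{ messages.} \tag{29}$$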

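Notice that even if we select all the $b_j$'s so that $M_j = O(n)$ for $j < k$, the $k$-th level will dominate with superlinear complexity since $b_{k-1} > 0$. Now, if we take a fixed number of levels $k$, we are interested in choosing the $b_j$'s that minimize the total number of messages $\sum_{j=1}^{k} M_j$ as $n \to \infty$. Since this sum is dominated by its term with the largest exponent, this is equivalent to the optimization problem:

$$\min_{b_1, \ldots, b_{k-1}}\ \max\left\{2 - \tfrac{3}{2}b_1,\ 1 + b_1 - \tfrac{3}{2}b_2,\ \ldots,\ 1 + b_{k-1}\right\} \tag{30}$$
$$\text{subject to}\quad 1 > b_1 > b_2 \tag{31}$$
$$b_2 > b_3 \tag{32}$$
$$\vdots \tag{33}$$
$$b_{k-1} > 0. \tag{34}$$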

In general, the solution to this problem does not yield $a = \frac{2}{3}$ at each level.
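For a fixed $k$, problem (30) to (34) can be solved numerically. The sketch below uses bisection on the common exponent $t$, a technique chosen here for simplicity and not taken from the paper. For a given $t$ it sets each $b_j$ to the largest value allowed by Eqs. (29) and (28), working from level $k$ up to level 2, and then checks the level-1 constraint (27); this greedy choice is optimal because each upper bound on $b_{j-1}$ grows with $b_j$, and level 1 only needs $b_1$ to be large enough. Strict inequalities are relaxed, so the result is the infimum of the objective. Function names are hypothetical.

```python
def feasible(t, k):
    """True if every level cost can be made O(n^t) with k >= 2 levels (strict '>' relaxed)."""
    if t <= 1.0:
        return False                     # M_k = n^{1 + b_{k-1}} with b_{k-1} > 0 is superlinear
    b = min(t - 1.0, 1.0)                # largest b_{k-1}: 1 + b_{k-1} <= t (Eq. 29), b <= 1
    for _ in range(k - 2):
        b = min(t - 1.0 + 1.5 * b, 1.0)  # largest b_{j-1}: 1 + b_{j-1} - 1.5 b_j <= t (Eq. 28)
    return 2.0 - 1.5 * b <= t            # level 1: M_1 = n^{2 - 1.5 b_1} (Eq. 27)


def optimal_exponent(k, iters=100):
    """Bisection for the smallest t such that max_j (exponent of M_j) <= t."""
    lo, hi = 1.0, 2.0                    # t = 1 is infeasible; t = 2 is always feasible
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if feasible(mid, k):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for k in range(2, 7):
        # optimal fixed-k exponent vs. the exponent 1 + (2/3)^{k-1} obtained with b_j = (2/3)^j
        print(k, round(optimal_exponent(k), 4), round(1 + (2 / 3) ** (k - 1), 4))
```

For $k = 3$ this gives $t = 23/19 \approx 1.21$, below the $1 + (2/3)^2 \approx 1.44$ obtained with $b_j = (2/3)^j$; the fixed-$k$ optimum instead makes every level superlinear by the same amount.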

As discussed in Section V, to get near-linear (e.g., $O(n\log\log n\log\epsilon^{-1})$) complexity we can allow the number of levels $k$ to depend on $n$ so that $k(n) \to \infty$ as $n \to \infty$. Notice that even if we use enough levels to have a fixed number of nodes at the finest level, we end up with a linear, $\Theta(n)$, number of cells at level $k$ and require constant time to gossip in each. As a result, the finest level's complexity can be linear at best.

If the number of levels is variable, it depends on the values of the subdivision parameters $b_j$. If the $b_j$'s are large, then each cell contains many nodes and we need many levels until we reach the finest level. If, on the other hand, the $b_j$'s are small, we create many small cells and few levels. If the number of cells is too large, however, we can no longer have $O(n)$ messages at an intermediate level. At the same time, we would like to have as few levels as possible. This principle, together with the desire to have as few messages as possible, can justify the selection of $a = \frac{2}{3}$ at each level. To see this, let us demand that $M_j = O(n)$. For the first level this is true if $2 - \frac{3}{2}b_1 \le 1$, which gives $b_1 \ge \frac{2}{3}$. Similarly, for any other level we need $1 + b_{j-1} - \frac{3}{2}b_j \le 1$, i.e., $b_j \ge \frac{2}{3}b_{j-1}$. Since smaller $b_j$'s reach cells of constant size in fewer levels, the best choice is the smallest $b_j$'s that still admit linear complexity at each level, namely $b_j = \left(\frac{2}{3}\right)^j$. This is the same as using the same $a = \frac{2}{3}$ to subdivide each cell at each level.

VI. Experimental Evaluation 

In this section we evaluate multiscale gossip in simulation and study its behaviour in practical scenarios. First we investigate the effect of using few versus many levels. Then we show that multiscale gossip performs very well against path averaging [13], the current state-of-the-art gossip algorithm, which requires a number of messages linear in the size of the network to converge to the average with $\epsilon$ accuracy. Finally, we investigate scenarios where transmissions do not always succeed and messages are either retransmitted or lost.

A. Varying levels of Hierarchy 

In the analysis we concluded that we can select the number of levels $k = \Theta(\log\log n)$, i.e., we do not need many levels. This can be verified in practice. Figure 2 investigates the effect of increasing the levels of hierarchy. The figure shows the number of messages until convergence within 0.0001 error, averaged over ten graphs of 5000 nodes. More levels yield diminishing returns, and we do not need more than 4 or 5 levels. As discussed in the next subsection, this observation led us to try a scheme with only two levels of hierarchy, which still produces an efficient algorithm.

B. Multiscale Gossip vs Path Averaging

We compare multiscale gossip against path averaging [13], which is in theory the fastest algorithm for gossiping on random geometric graphs. It is worth emphasizing that both algorithms operate under the same two assumptions. First, each node needs to know the coordinates of itself and its neighbours on the unit square. Second, each node needs to know the size of the network $n$. In path averaging this is implicit, since each message needs to be routed back to the source through the same path. It is thus necessary that nodes have globally unique ids, which is equivalent to knowing the maximum id and thus the size of the network. In multiscale gossip, each node uses the network size to determine its role in the logical hierarchy and to decide the number of hierarchy levels.

Fig. 2. Increasing the levels of hierarchy yields diminishing returns. Results are averages over 10 random geometric graphs with 5000 nodes and final accuracy $\epsilon = 0.0001$. All graphs are created with the same connectivity radius $r(n)$.

Figure 3 shows the number of messages needed to converge within $\epsilon = 0.0001$ error for graphs of sizes 500 to 8000. The bottom curve, labeled MultiscaleGossip, shows the ideal case where computation inside each cell stops automatically when the desired accuracy is reached. The curve labeled MultiscaleGossipFI was generated using a fixed number of iterations per level based on worst-case graph sizes, and the curve labeled MultiscaleGossip2level was generated using only two levels of hierarchy and $a = \frac{1}{2}$ instead of $\frac{2}{3}$. Both of these variants are explained below. For path averaging we also simulate the ideal scenario where nodes stop transmitting automatically when the targeted accuracy is achieved. As we see, all variants of multiscale gossip use noticeably fewer transmissions than path averaging. One reason why path averaging seems to be slower than in [13] is that our graphs use a smaller connectivity radius than the graphs in [13].

Figure 4 depicts the cumulative distribution functions of transmissions for multiscale gossip and path averaging. Specifically, we plot the fraction of nodes that transmitted $t$ times or fewer for a random geometric graph with 2000 nodes. Both path averaging and multiscale gossip were stopped as soon as the desired error level was reached: $\|\mathbf{x} - \mathbf{x}_{av}\| < 0.0001\,\|\mathbf{x}(0)\|$. As we see, the node with the most transmissions in multiscale gossip still sends fewer messages than about 22% of the nodes in path averaging.

Fig. 3. Comparison of multiscale gossip to path averaging on random geometric graphs of increasing sizes. MultiscaleGossip is used with 5 levels of hierarchy. MultiscaleGossipFI is the version using a fixed number of iterations for gossiping at a specific level. MultiscaleGossip2level is a version using only two levels of hierarchy and is explained in Section VI. Results are averages over 20 random geometric graphs with 8000 nodes and final accuracy $\epsilon = 0.0001$ reached on all runs. All graphs are created with the same connectivity radius $r(n)$.

[Figure 4 plot: curves MultiscaleGossip and PathAveraging; x-axis: Number of messages.]

Fig. 4. Cumulative distribution function of the probability that a node sends at most $t$ messages, as a function of $t$, on a random geometric graph with 2000 nodes. The node with the maximum number of transmissions for multiscale gossip has fewer transmissions than about 22% of the nodes in path averaging.

Multiscale gossip has several advantages over path averaging. All of its information exchange relies on pairwise messages. In contrast, averaging over paths of length more than two has two main disadvantages. First, if a message is lost, a large number of nodes (potentially $O(\sqrt{n/\log n})$) are affected by the information loss. Second, when messages are sent to a remote location over many hops, they increase in size as the message body accumulates the information of all the intermediate nodes. Besides being variable, the message size then depends on the length of the path and ultimately on the network size. Our messages are always of constant size, independent of the hop distance or network size. Moreover, the maximum number of hops any message has to travel is $O(n^{1/3})$ at worst.¹ This should be compared to the distance $O(\sqrt{n})$ which is necessary for path averaging to achieve linear scaling. Finally, multiscale gossip is relatively easy to analyze and implement, using standard randomized gossip as a building block for the averaging computations.

Fixed Number of Iterations per Level: The ideal scenario for multiscale gossip is that computation inside each cell stops automatically when the desired accuracy is reached. This way no messages are wasted. However, in practice, cells at the same level may need to gossip on graphs of different sizes, which take different numbers of messages to converge. This creates a need for node synchronization so that all computation in one level is finished before the next level can begin. To alleviate the synchronization issue, we can fix the number of randomized gossip iterations per level so that the computation in different subgraphs at the same level takes practically the same amount of time. However, we need to be careful not to perform fewer iterations than needed for the desired accuracy. Given that nodes are deployed uniformly at random in the unit square, we can make a worst-case estimate of how many nodes are expected to be in a cell of a certain area. Since by construction all cells at the same level have equal area, we gossip on all graphs at that level for a fixed number of iterations. Moreover, as seen in the previous section, we can use enough levels of hierarchy to have only between $m$ and $M$ nodes per cell at the last level. This ensures that we do not perform fewer iterations than necessary. In practice, at level $k$ we usually have fewer nodes than expected, so we end up wasting messages by running gossip for longer than necessary.

Two-level Gossip: Multiscale gossip is a synchronized algorithm where computation in one level begins after the previous level has converged. Synchronization can be complicated or inefficient if we have too many levels. This motivates trying an algorithm with only two levels. In this case, for graphs of a few thousand nodes, splitting the unit square into $n^{1-a}$ cells with $a = \frac{2}{3}$ is not a good choice, as it produces a very small grid of representatives and quite large level-1 cells. To achieve better load balancing between the two levels, we use $a = \frac{1}{2}$. This choice has the advantage that the maximum number of hops any message has to travel is $O(n^{1/4})$. To see this, observe that each $C(2,\cdot)$ cell has area $n^{a}/n = n^{-1/2} = n^{-1/4} \times n^{-1/4}$. Thus the maximum distance between representatives is $O(n^{-1/4})$. Dividing by the connecting radius $r(n) = \sqrt{c\log n/n}$ gives the result, up to a logarithmic factor. Another interesting finding is that, for moderately sized graphs, using cells of area $n^{-1/2}$ produces subgraphs which are very well connected. Since nodes are deployed uniformly at random, an area $n^{-1/2}$ is expected to contain $\sqrt{n}$ nodes. A subgraph inside a $C(2,\cdot)$ cell is still a random geometric graph with $t = \sqrt{n}$ nodes, but the radius used to connect nodes is not $\sqrt{c\log t/t}$; it is $\sqrt{c\log n/n}$. This is equivalent to creating a random geometric graph of $t$ nodes in the unit square but with a scaled-up radius of $r_t = n^{1/4}\sqrt{c\log n/n} = \sqrt{c\log n/\sqrt{n}}$. From [27] we know that a random geometric graph of $t$ nodes is rapidly mixing (i.e., it needs a linear number of messages for convergence) if the connecting radius is $r_{rapid} = 1/\mathrm{polylog}(t)$. Now, e.g., for $c = 3$ and $n < 9 \times 10^6$, we get $r_t > r_{rapid}$ for $t = \sqrt{n} < 3000$. Consequently, the $C(2,\cdot)$ cells are rapidly mixing for networks of fewer than a few million nodes. Figure 3 verifies this analysis. For graphs from 500 to 8000 nodes and final error 0.0001, we see that MultiscaleGossip2level performs very close to multiscale gossip with more levels of hierarchy and better than path averaging.

¹ At level 1 the distance in hops between leaders is at worst $O(n^{a/2}) = O(n^{1/3})$ for $a = \frac{2}{3}$.

C. Operating under Transmission Failures 

As explained in the previous subsection, multiscale gossip sends messages over shorter distances across the network than path averaging. It is important to see what effect this has on the robustness of the algorithms against transmission failures. We consider two different but general scenarios. In the first scenario, no message is truly lost. There is a nonzero probability that a message over a network edge is not sent successfully, but the nodes communicate via a handshake mechanism, so messages are eventually delivered after a number of attempts. If the probability of successful transmission is $p$, then the cost of a single message transmission over an edge is geometrically distributed: $P[\mathrm{Cost} = m] = (1-p)^{m-1}p$ for $m = 1, 2, \ldots$. The second scenario is more extreme. Each message is delivered with probability $p$ over each edge; otherwise it is lost. This model has severe consequences. Depending on where in its path the message is lost, a number of nodes will not update their values properly, so besides the overall delay in convergence, part of the signal energy is lost and the final estimate of the average is no longer guaranteed to be close to the true average.
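The two failure models can be contrasted with a few lines of code. Assuming independent link failures with success probability $p$ (the hop counts below are illustrative, not measured), a message that travels $h$ hops costs $h/p$ link transmissions in expectation under the handshake model, while under the message-loss model it arrives intact only with probability $p^h$. Both quantities degrade with $h$, which is why shorter routes help. The function names are hypothetical.

```python
import random


def handshake_cost(hops, p, rng):
    """Link-level transmissions needed to push one message over `hops` links with retransmissions."""
    total = 0
    for _ in range(hops):
        attempts = 1
        while rng.random() >= p:     # each attempt succeeds independently with probability p
            attempts += 1
        total += attempts            # attempts ~ Geometric(p): P[m] = (1-p)**(m-1) * p
    return total


def delivery_probability(hops, p):
    """Message-loss model: the message arrives only if all `hops` links succeed."""
    return p ** hops


if __name__ == "__main__":
    rng = random.Random(0)
    p = 0.5
    for hops in (5, 20, 80):         # illustrative hop counts only
        trials = [handshake_cost(hops, p, rng) for _ in range(10_000)]
        print(hops, sum(trials) / len(trials), hops / p, delivery_probability(hops, p))
```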

1) Hand Shake Model: In this scenario we do not have to worry about convergence. All messages are eventually delivered; it is just a matter of time. Figure 5 shows the results of multiscale gossip against path averaging on networks of different sizes. The probability of successful transmission also varies from $p = 0.5$ to $p = 1$. As we see, multiscale gossip is significantly less affected by such failures. This example clearly illustrates the importance of not having to send messages over long distances in the network. Since each individual link introduces some delay, the fact that messages in multiscale gossip usually do not need to travel far, and need to go $O(n^{1/3})$ hops at most, allows the algorithm to converge using far fewer messages than path averaging.

2) Message Loss Model: In this scenario, a message is delivered with probability $p$ or lost forever. This severely slows down the convergence of both algorithms. Moreover, information is lost permanently, distorting the final result and making it impossible to meet the desired final accuracy. The amount of distortion depends on where along a multi-hop path the failure occurs. For example, in path averaging, if the message is lost at the first transmission on its way back, all the nodes along the path except the last will have distorted information. Similar situations occur in multiscale gossip during communication between leaders, where messages need to travel across multiple hops. Since in this scenario we have no guarantee that the desired final accuracy criterion will be met, it is hard to draw conclusions about whether multiscale gossip is better than path averaging. Our observations showed that both algorithms can only reach an accuracy on the order of 0.01 when targeting $\epsilon = 0.0001$. Specifically, multiscale gossip would only reach up to 0.06 accuracy while path averaging could not improve beyond 0.02. At the same time, the total number of messages still increases linearly for multiscale gossip while it seems to blow up exponentially for path averaging.

Fig. 5. Multiscale gossip and path averaging on graphs of various sizes using handshakes to overcome transmission failures. The probability of successful transmission $p$ varies from 0.5 to 1. All results are averaged over 25 graphs of each size. Final accuracy is $\epsilon = 0.0001$ and the random geometric graphs use connectivity radius $r(n)$.

VII. Practical Considerations

There are a number of practical considerations that we would like to bring to the reader's attention. We list them in the form of questions below:

How can we detect convergence in a subgraph or cluster? Do the nodes need to be synchronized?

At each hierarchy level, representatives know the size of the grid they are gossiping over (a function of $n$ and $k$ only). Moreover, all grids at the same level are of the same size, and we have tight bounds on the number of messages needed to obtain $\epsilon$ accuracy on grids w.h.p. We can thus gossip on all grids for a fixed number of rounds, and synchronization is implicit. At level $k$, however, we generally need to gossip on random geometric subgraphs which are not of exactly the same size. As $n$ gets large, though, random geometric graphs tend to become regular and uniformly spaced on the unit square. Therefore, the subgraphs contained in cells at level $k$ all have sizes very close to the expected value of $n^{(2/3)^{k-1}}$. Thus, we run gossip for a fixed number of rounds using the theoretical bound for graphs of size $n^{(2/3)^{k-1}}$. As discussed in Section III, fixing the number of iterations leads to redundant transmissions; however, the algorithm is still very efficient.

What happens with disconnected subgraphs or grids due to empty grid cells? This is possible since dividing the unit square into grid cells does not guarantee that each cell contains any nodes of the initial graph. Representatives use multi-hop communication, and connected grids can always be constructed as long as the initial random geometric graph is connected. At level $k$ the subgraphs of the initial graph contained in each cell could still be disconnected if edges that go outside the cell are not allowed. However, as explained in Section V, we can use enough hierarchy levels so that each $C(k,\cdot)$ cell is a complete graph and the probability of getting a disconnected $C(k,\cdot)$ cell tends to zero.

How can we select representatives in a natural way? The easiest solution is to pick the point $p_c$ that is geographically at the center of each cell. Again, knowledge of $n$ and $k$ uniquely identifies the position of each cell and also $p_c$. By sending all messages to $p_c$, geographic routing will deliver them to the unique node that is closest to that location w.h.p. To change representatives, we can deterministically pick a location $p_c + u$ chosen so that a different node is closest to it. A more sophisticated solution would be to employ a randomized auction mechanism: each node in a cell generates a random number, and the node with the largest number becomes the representative. Once a new message enters a cell, the nodes, knowing their neighbours' values, route the message to the cell representative. Notice that determining cell leaders this way does not incur more than linear cost.
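The center-point rule for choosing representatives can be sketched centrally as follows. This is an illustration only: we assume the $C(j+1,\cdot)$ cells form a uniform square grid with about $\sqrt{n^{1-(2/3)^j}}$ cells per side (the rounding is our assumption), and we compute directly which node is closest to $p_c$ or to a shifted point $p_c + u$, i.e., the node at which, per the text, geographic routing delivers messages w.h.p. All names are hypothetical.

```python
import math


def cells_per_side(n, j, a=2 / 3):
    """Approximate number of C(j+1, .) cells per side: sqrt(n^{1 - a^j}), rounded (assumption)."""
    return max(1, round(n ** ((1 - a ** j) / 2)))


def cell_centers(side):
    """Centers p_c of a uniform side x side grid of cells on the unit square."""
    h = 1.0 / side
    return [((i + 0.5) * h, (jj + 0.5) * h) for i in range(side) for jj in range(side)]


def nearest_node(nodes, point):
    """Index of the node closest to `point`; per the text, geographic routing ends here w.h.p."""
    return min(range(len(nodes)), key=lambda v: math.dist(nodes[v], point))


def representatives(nodes, side, shift=(0.0, 0.0)):
    """One representative per cell: the node nearest to p_c + u, where u = shift."""
    return [nearest_node(nodes, (cx + shift[0], cy + shift[1])) for cx, cy in cell_centers(side)]


if __name__ == "__main__":
    nodes = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.76, 0.74), (0.5, 0.5)]
    print(representatives(nodes, side=2))
    print(representatives(nodes, side=2, shift=(0.2, 0.2)))
    print(cells_per_side(5000, 1))
```

If a cell happens to be empty, the nearest node lies in a neighbouring cell; this sketch does not handle that case.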

Are representatives bottlenecks and single points of failure? This is not an issue. There might be a small imbalance in the amount of work done by each node, but it can be alleviated by selecting different representatives at each hierarchy level. Moreover, for increased robustness, at a linear cost we can disseminate the representative's value to all the nodes in its cell. This way, if a representative dies, another node in the cell can take its place. The new representative will have a value very similar (within $\epsilon$) to that of the initial representative at the beginning of the computation at the current level. Thus node failure is expected to cause only a small delay in convergence at that level. We should emphasize, however, that the effect of node failures has received little attention so far and still calls for a more systematic investigation.

How much extra energy do the representatives need to spend? This question is difficult to answer analytically, so we use simulation to get a feel for it. Figure 6 shows the number of messages sent by each of the 5000 nodes in a random geometric graph. For this case we used five levels of hierarchy. We expect that some nodes will transmit more messages since, as we move down the hierarchy, cells get smaller and there are fewer nodes from which to draw representatives. In this example, by randomly selecting representatives at each level, no node was a representative more than 3 times. We show the number of transmissions for nodes of each type in Table I, including messages relayed by intermediate nodes using geographic routing. As we see, most of the nodes send very few messages. Moreover, the average degree in this particular example is 26, and thus, on average, each node sends fewer messages than it has neighbors.

February 29, 2012 DRAFT

TABLE I
Mean and standard deviation of number of transmissions for different types of nodes running multiscale gossip on a random geometric graph with 5000 nodes with 5 levels of hierarchy (see also Figure 6).

Node type                      Mean #msg     Std
Three times representatives      58.63      27.05
Two times representatives        31.23      21.11
One time representatives         18.77      15.21
Never representatives             8.65      10.23
All nodes                        16.85      16.67

[Figure 6 plot: number of sent messages versus Node ID; legend: Representatives three times, Representatives twice, Never representatives.]

Fig. 6. Node utilization on a random geometric graph with 5000 nodes and final desired accuracy $\epsilon = 0.0001$. For each node the total number of sent messages is shown. Nodes 1 to 30 have been representatives three times. Nodes 31 to 662 have been representatives two times. Nodes 663 to 3157 have been representatives once. The rest of the nodes never had a representative role. All nodes may have participated as intermediates in multi-hop communication.

Figure 7 illustrates the typical fraction of transmissions per node as a function of location in the unit square, normalized as a probability distribution. Each pixel is assigned an intensity proportional to the number of messages that nodes located in that region will typically transmit. Thus this figure reveals which nodes suffer from the heaviest traffic. The figure is the result of averaging over 100 realizations of gossip for each of 100 different 1000-node random geometric graphs. As the figure shows, multiscale gossip tends to distribute traffic almost uniformly across all nodes. The observed color pattern is consistent with the hierarchical nature of the algorithm, and although nodes that become representatives send more messages, no node is heavily used. On the other hand, for path averaging we observe that the nodes at the center (red region) send many more messages than nodes around the perimeter. This should be expected, since geographic routing (which path averaging relies on) is greedy when trying to deliver messages to remote locations across the network, so routes between distant nodes tend to cross the center of the square.

Fig. 7. Comparison of communication load between multiscale gossip (left) and path averaging (right). Color intensity is proportional to the amount of transmissions through that part of the unit square. More blue (lower intensity) means little traffic, while more red (higher intensity) means heavy traffic. Multiscale gossip distributes traffic relatively uniformly across all nodes, while in path averaging the nodes at the center handle much more traffic than nodes around the perimeter. Results are averages over 100 graphs of 1000 nodes each.



VIII. Discussion and Future Work

We have presented a new algorithm for distributed averaging exploiting hierarchical computation. Multiscale gossip separates the computation into phases of linear cost and achieves very close to linear complexity overall. The key to achieving nearly-linear scaling lies in the way the recursive network partition is constructed. In particular, we argue that refining the network so that subnetworks at scale $j$ contain $\Theta(n^{(2/3)^j})$ nodes provides optimal message-scaling laws with a minimal number of levels in the hierarchy; although other hierarchical partitioning schemes could be constructed to achieve nearly-linear message complexity, they would require deeper hierarchies and, consequently, additional management overhead. Our analysis focused on network topologies modeled as random geometric graphs, but it should be clear that the results translate directly to grids (2-dimensional lattices) embedded in the unit square.

Another feature of the proposed scheme is that the maximum distance any message has to travel is 
$O(n^{1/3})$ hops, which is shorter than the $O(\sqrt{n/\log n})$ hops needed by path averaging ifTSll, where each 
iteration involves averaging along a path of nodes potentially spanning the diameter of the network. 
Requiring transmissions over shorter distances is advantageous when transmitting over unreliable links 
that use acknowledgements and retransmissions to ensure reliable communication at the link layer, as 
is common practice in many existing systems. Intuitively, shorter paths translate directly into fewer 
retransmissions, and we illustrate this via simulation. 
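A minimal model of why shorter routes need fewer retransmissions: assume, as an idealization rather than a description of any particular link layer, that each transmission attempt on a link succeeds independently with probability $p$ and that the link layer retransmits until it receives an acknowledgement, ignoring lost acknowledgements and retry limits. The number of attempts per hop is then geometric with mean $1/p$, so an $h$-hop route costs $h/p$ transmissions on average. The sketch evaluates this for the $O(n^{1/3})$ hops of multiscale gossip and the $O(\sqrt{n/\log n})$ hops of path averaging with all hidden constants set to one; $p = 0.8$, $n = 10^6$, and the function names are hypothetical.

```python
import math
import random


def expected_transmissions(hops, p):
    """Mean number of link-layer transmissions on an h-hop route when every hop
    retransmits until it succeeds and each attempt succeeds independently with prob. p."""
    return hops / p  # geometric number of attempts per hop, mean 1/p


def simulate_transmissions(hops, p, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity under the same idealized model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for _ in range(hops):
            attempts = 1
            while rng.random() >= p:  # attempt lost, no ACK: retransmit
                attempts += 1
            total += attempts
    return total / trials


n = 10**6
multiscale_hops = round(n ** (1 / 3))              # O(n^{1/3}), constant set to 1
path_avg_hops = round(math.sqrt(n / math.log(n)))  # O(sqrt(n / log n)), constant set to 1
p = 0.8                                            # assumed per-attempt success probability
print(multiscale_hops, expected_transmissions(multiscale_hops, p))
print(path_avg_hops, expected_transmissions(path_avg_hops, p))
```

Since the expected cost is linear in the hop count under this model, the per-message saving is simply the ratio of route lengths; the absolute numbers depend on the constants that were set to one.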

We see a number of interesting directions for future work. In our present algorithm, gossip at 
higher levels happens on overlay grids, which are known to require a number of messages that scales 
quadratically in the size of the grid. Since these grids already use multi-hop communication, it may be 
possible to further improve performance by devising other overlay graphs between representatives with 
better convergence properties, e.g., expander graphs [28]. Moreover, the subdivision of the unit square into 
grid cells is not necessarily natural with respect to the topology of the graph, and one could use other 
methods for constructing hierarchical partitions that are tuned to the network topology. Our preliminary 
results using hierarchical spectral clustering appear promising in simulation. It is, however, not clear 
how to carry out spectral clustering in a decentralized manner using a linear number of messages. 
Another possibility is to combine the multiscale approach with the use of more memory at each node 
to obtain faster mixing rates. Note, however, that how to use memory to provably accelerate asynchronous 
gossip is still an open question; current results only consider synchronous algorithms ifTTll. Finally, an 
important advantage of gossip algorithms in general is their robustness. However, the general question of 
modeling and reacting to node failures has not been formally investigated in the literature. It would be 
very interesting to introduce failures and study their effect on the performance of different gossip algorithms. 
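The quadratic cost of gossip on grid overlays can be observed in a small experiment with standard randomized pairwise gossip, in which a uniformly random node averages its value with a uniformly random grid neighbor. The sketch counts exchanges until the deviation from the average falls below a fraction `eps` of its initial value. It is plain gossip on a single m x m grid, not the multiscale algorithm, and the function name, `eps`, seeds, and grid sizes are illustrative assumptions.

```python
import math
import random


def gossip_exchanges_on_grid(m, eps=1e-2, seed=0):
    """Pairwise exchanges until ||x - x_av|| <= eps * ||x(0) - x_av|| on an m x m grid.

    Standard randomized gossip: a uniformly random node wakes up and averages with a
    uniformly random grid neighbor. Each exchange costs two messages.
    """
    rng = random.Random(seed)
    n = m * m
    x = [rng.random() for _ in range(n)]
    x_av = sum(x) / n  # pairwise averaging preserves the sum, hence the average

    def deviation():
        return math.sqrt(sum((v - x_av) ** 2 for v in x))

    target = eps * deviation()
    exchanges = 0
    while deviation() > target:
        i = rng.randrange(n)
        r, c = divmod(i, m)
        neighbors = [(r + dr) * m + (c + dc)
                     for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= r + dr < m and 0 <= c + dc < m]
        j = rng.choice(neighbors)
        x[i] = x[j] = 0.5 * (x[i] + x[j])
        exchanges += 1
    return exchanges


if __name__ == "__main__":
    for m in (4, 6, 8):
        print(m * m, gossip_exchanges_on_grid(m))
```

Doubling m quadruples the number of nodes, while the exchange count should grow by roughly the square of that factor, up to random fluctuations and boundary effects.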

Appendix A 

Complementary derivations for multi-scale gossip error bound 
Inequality (12) contains three terms that we need to bound. 

• $T_1 = \epsilon\,\|\{x_{s_1}\}\|/\|x\|$: We can bound the numerator as follows. At some level $j$ we have 
$|x_{s_1\cdots s_j}| = |x_{s_1\cdots s_j c_{j+1}}| \le \max_{1 \le i_{j+1} \le L_{j+1}} |x_{s_1\cdots s_j i_{j+1}}|$ for some $1 \le c_{j+1} \le L_{j+1}$. 
We can keep bounding all the way down to level $k$ in exactly the same fashion, since 
$|x_{s_1\cdots s_j c_{j+1}}| \le \max_{1 \le i_{j+2} \le L_{j+2}} |x_{s_1\cdots s_j c_{j+1} i_{j+2}}|$, and so on. If we do this for all entries 
of the vector $\{x_{s_1}\}$, the numerator is bounded by the norm of a subset of the entries of the initial vector $x$, 
one entry per subnetwork. Since we divide this expression by $\|x\|$, which contains all of these entries and more, 
the ratio is at most one. This means $T_1 \le \epsilon$. 

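• $T_2$: Using definition (5) and the fact that an average such as $x_{\mathrm{av}}$ can be written as the average of averages, 

$$\frac{1}{\|x\|}\left|\frac{1}{L_1}\sum_{s_1=1}^{L_1} x_{s_1} - x_{\mathrm{av}}\right| = \frac{1}{\|x\|}\left|\frac{1}{L_1}\sum_{s_1=1}^{L_1} x_{s_1} - \frac{1}{L_1 L_2 \cdots L_k}\sum_{s_1=1}^{L_1}\sum_{s_2=1}^{L_2}\cdots\sum_{s_k=1}^{L_k} x_{s_1 s_2 \cdots s_k}\right| \tag{35}$$

Pulling $1/L_1$ and the summation over $s_1$ out, adding and subtracting the $L_2$ mean, taking the absolute value, and using the triangle inequality, 

$$\frac{1}{\|x\|}\left|\frac{1}{L_1}\sum_{s_1=1}^{L_1} x_{s_1} - x_{\mathrm{av}}\right| \le \frac{1}{\|x\| L_1}\sum_{s_1=1}^{L_1}\left(\left|x_{s_1} - \frac{1}{L_2}\sum_{s_2=1}^{L_2} x_{s_1 s_2}\right| + \left|\frac{1}{L_2}\sum_{s_2=1}^{L_2} x_{s_1 s_2} - \frac{1}{L_2 L_3 \cdots L_k}\sum_{s_2=1}^{L_2}\cdots\sum_{s_k=1}^{L_k} x_{s_1 s_2 \cdots s_k}\right|\right)$$

Using bound (6), 

$$\frac{1}{\|x\|}\left|\frac{1}{L_1}\sum_{s_1=1}^{L_1} x_{s_1} - x_{\mathrm{av}}\right| \le \frac{1}{\|x\| L_1}\sum_{s_1=1}^{L_1}\left(\epsilon\,\|\{x_{s_1 s_2}\}\| + \left|\frac{1}{L_2}\sum_{s_2=1}^{L_2} x_{s_1 s_2} - \frac{1}{L_2 L_3 \cdots L_k}\sum_{s_2=1}^{L_2}\cdots\sum_{s_k=1}^{L_k} x_{s_1 s_2 \cdots s_k}\right|\right)$$

Using definition (5), 

$$\frac{1}{\|x\|}\left|\frac{1}{L_1}\sum_{s_1=1}^{L_1} x_{s_1} - x_{\mathrm{av}}\right| \le \frac{1}{\|x\| L_1}\sum_{s_1=1}^{L_1}\left(\epsilon\,\|\{x_{s_1 s_2}\}\| + \frac{1}{L_2}\left|\sum_{s_2=1}^{L_2}\left(x_{s_1 s_2} - \frac{1}{L_3 \cdots L_k}\sum_{s_3=1}^{L_3}\cdots\sum_{s_k=1}^{L_k} x_{s_1 s_2 \cdots s_k}\right)\right|\right)$$

Pulling the summation and the $1/L_2$ term outside, 

$$\frac{1}{\|x\|}\left|\frac{1}{L_1}\sum_{s_1=1}^{L_1} x_{s_1} - x_{\mathrm{av}}\right| \le \frac{1}{\|x\|}\left(\frac{\epsilon}{L_1}\sum_{s_1=1}^{L_1}\|\{x_{s_1 s_2}\}\| + \frac{1}{L_1 L_2}\sum_{s_1=1}^{L_1}\sum_{s_2=1}^{L_2}\left|x_{s_1 s_2} - \frac{1}{L_3 \cdots L_k}\sum_{s_3=1}^{L_3}\cdots\sum_{s_k=1}^{L_k} x_{s_1 s_2 \cdots s_k}\right|\right) \tag{40}$$

We repeatedly pull out terms and add and subtract means. At the finest scale the remaining difference is zero, since both averages are then taken over the same initial values $x_{s_1\cdots s_k}$. This gives 

$$\frac{1}{\|x\|}\left|\frac{1}{L_1}\sum_{s_1=1}^{L_1} x_{s_1} - x_{\mathrm{av}}\right| \le \frac{1}{\|x\|}\Bigg(\frac{\epsilon}{L_1}\sum_{s_1=1}^{L_1}\|\{x_{s_1 s_2}\}\| + \frac{1}{L_1 L_2}\sum_{s_1=1}^{L_1}\sum_{s_2=1}^{L_2}\bigg(\epsilon\,\|\{x_{s_1 s_2 s_3}\}\| + \cdots \tag{43}$$

$$\cdots + \frac{1}{L_{k-1}}\sum_{s_{k-1}=1}^{L_{k-1}}\Big(\epsilon\,\|\{x_{s_1 \cdots s_k}\}\| + 0\Big)\cdots\bigg)\Bigg) \tag{44}$$

The argument used to bound $T_1$ helps here as well. Each term in the numerator can be bounded by terms from the initial vector, and dividing by $\|x\|$ we get 

$$\frac{1}{\|x\|}\left|\frac{1}{L_1}\sum_{s_1=1}^{L_1} x_{s_1} - x_{\mathrm{av}}\right| \le \frac{\epsilon}{L_1}\sum_{s_1=1}^{L_1} 1 + \frac{\epsilon}{L_1 L_2}\sum_{s_1=1}^{L_1}\sum_{s_2=1}^{L_2} 1 + \cdots + \frac{\epsilon}{L_1 \cdots L_{k-1}}\sum_{s_1=1}^{L_1}\cdots\sum_{s_{k-1}=1}^{L_{k-1}} 1 = (k-1)\,\epsilon.$$

This holds because each summation cancels with the corresponding denominator, leaving only an $\epsilon$ term, and there are $k-1$ such terms. Hence $T_2 \le L_1 (k-1)\,\epsilon$.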
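The telescoping argument for $T_2$ can be sanity-checked numerically. The sketch below builds a hierarchy with branching factors $L_1, \dots, L_k$ over random leaf values $x_{s_1\cdots s_k}$ and forms each value $x_{s_1\cdots s_j}$ at levels $1, \dots, k-1$ as an inexact average of its children: a convex combination of the children's mean and one child, off the mean by at most $\epsilon\|\{x_{s_1\cdots s_{j+1}}\}\|$. This perturbation model is an assumption standing in for the accuracy guarantee invoked as bound (6), chosen so that every value stays within the range of its children as pairwise averaging does; it is not the paper's algorithm. The function `check_telescoping_bound` and its parameters are hypothetical.

```python
import math
import random


def check_telescoping_bound(branching, eps, seed=0):
    """Return (lhs, rhs), lhs = |mean_{s_1} x_{s_1} - x_av| / ||x||, rhs = (k - 1) * eps.

    `branching` lists L_1, ..., L_k. Leaves (level k) hold the initial values. Each value
    at levels 1..k-1 is an inexact average of its children: a convex combination of the
    children's mean and one child, off the mean by at most eps * ||children||
    (an assumed stand-in for the gossip accuracy guarantee, not the paper's algorithm).
    """
    rng = random.Random(seed)
    k = len(branching)
    leaves = []

    def build(level):
        if level == k:  # finest scale: an initial value x_{s_1...s_k}
            value = rng.uniform(-1.0, 1.0)
            leaves.append(value)
            return value
        # branching[level] is L_{level+1}, the number of children at the next scale.
        children = [build(level + 1) for _ in range(branching[level])]
        mean = sum(children) / len(children)
        anchor = rng.choice(children)
        gap = abs(anchor - mean)
        budget = eps * math.sqrt(sum(c * c for c in children))  # eps * ||{children}||
        t = rng.random() * (min(1.0, budget / gap) if gap > 0 else 0.0)
        return mean + t * (anchor - mean)  # |error| <= budget, value stays in range

    level1 = [build(1) for _ in range(branching[0])]
    x_av = sum(leaves) / len(leaves)  # equal branching: mean of leaves = average of averages
    norm_x = math.sqrt(sum(v * v for v in leaves))
    lhs = abs(sum(level1) / len(level1) - x_av) / norm_x
    return lhs, (k - 1) * eps


if __name__ == "__main__":
    print(check_telescoping_bound([4, 3, 3, 2], eps=0.05))
```

Every trial should satisfy lhs <= rhs, because the model keeps each value within the range of its children, which is exactly what the $T_1$ argument uses to bound each $\|\{x_{s_1\cdots s_{j+1}}\}\|$ by $\|x\|$.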